Reconstructing False Start Errors in Spontaneous Speech Text

Authors

  • Erin Fitzgerald
  • Keith B. Hall
  • Frederick Jelinek
Abstract

This paper presents a conditional random field-based approach for identifying speaker-produced disfluencies (i.e., if and where they occur) in spontaneous speech transcripts. We emphasize false start regions, which are often missed in current disfluency identification approaches because they lack lexical or structural similarity to the speech immediately following. We find that combining lexical, syntactic, and language model-related features with the output of a state-of-the-art disfluency identification system improves overall word-level identification of these and other errors. Improvements are reinforced under a stricter evaluation metric requiring exact matches between cleaned sentences and annotator-produced reconstructions, and altogether show promise for general reconstruction efforts.
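The two evaluation views mentioned in the abstract (word-level identification and exact sentence-level match after cleaning) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the boolean label scheme (`True` means the word is marked as part of an error region), and the rule that cleaning simply deletes marked words are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def word_level_prf(
    gold: Sequence[Sequence[bool]], pred: Sequence[Sequence[bool]]
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over words marked as errors (True).

    Assumed label scheme: one boolean per word, True = disfluent.
    """
    tp = fp = fn = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("gold and predicted label sequences differ in length")
        for g, p in zip(g_sent, p_sent):
            if g and p:
                tp += 1
            elif p:
                fp += 1
            elif g:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def clean(tokens: Sequence[str], labels: Sequence[bool]) -> List[str]:
    """Assumed cleaning rule: drop every word labeled as an error."""
    return [tok for tok, is_error in zip(tokens, labels) if not is_error]


def exact_match_rate(
    sentences: Sequence[Sequence[str]],
    pred: Sequence[Sequence[bool]],
    references: Sequence[Sequence[str]],
) -> float:
    """Fraction of sentences whose cleaned form equals the reference exactly."""
    if not references:
        return 0.0
    matches = sum(
        clean(sent, labels) == list(ref)
        for sent, labels, ref in zip(sentences, pred, references)
    )
    return matches / len(references)
```

The sentence-level metric is stricter because a single missed word in a false start makes the whole cleaned sentence count as wrong, even when most word-level labels are correct.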

Similar resources

Automatic Spontaneous Speech Recognition for Punjabi Language Interview Speech Corpus

Automatic speech recognition offers a natural means of communication between humans and machines. The purpose of a speech recognition system is to convert a sequence of sound units into text. The main objective of this research is to develop an automatic spontaneous speech model for the Punjabi language. Punjabi is categorized as a constituent of the Indo-Ar...

Disfluency patterns in dialogue processing

Spontaneous speech abounds with disfluencies such as filled pauses, repairs, repetitions, false starts, and prolongations, all of which are significant but easily overlooked features of speech communication. Based on comparable corpora of English and Japanese dialogues, we argue that disfluency features can have a positive effect on turn-taking issues and the establishment of common referring...

Listeners' ERP responses to false starts and repetitions in spontaneous speech

Hindle [1] suggested that false starts and repetitions should be handled differently in a computational account of the processing of the two kinds of disfluency, and there is behavioural evidence that the human sentence processing mechanism likewise honours this distinction [2]. The same dichotomy was also evident in the electrophysiological data reported here. False starts and repetitions were...

Sentence boundary detection of spontaneous Japanese using statistical language model and support vector machines

This paper presents two different approaches, one utilizing a statistical language model (SLM) and the other support vector machines (SVM), for sentence boundary detection of spontaneous Japanese. In the SLM-based approach, linguistic likelihoods and the occurrence of pauses are used to determine sentence boundaries. To suppress false alarms, heuristic patterns of end-of-sentence expressions are also incorporated. On...

Syntactic annotation of spontaneous speech: application to call-center conversation data

Both frameworks are based on the automatic semantic analysis of human-human spoken conversations. The semantic interpretation of a spoken utterance can be split into a two-level process: a tagging process that projects lexical items into basic conceptual constituents, and a composition process that takes these basic constituents as input and combines them in a possibly complex semantic interpretatio...

Publication date: 2009